1 Getting started

The MetaboDiff R package can be installed via Github and requires R version >= 3.4.

1.1 Installing dependencies

CRAN occasionally fails to compile the WGCNA package for Mac OS X. Hence, we recommend to install the package before installing MetaboDiff:

install.packages("WGCNA")

If asked, install the package from source.

Alternatively you might need to download the package from the developer
and install the package locally:

install.packages(path_to_file, repos = NULL, type="source")

If you encounter problems installing WGCNA, please refer to the developer page.

Please note that MetaboDiff can only be installed if WGCNA was successfully installed.

1.2 Installing MetaboDiff

MetaboDiff can be installed via Github

library("devtools")
install_github("andreasmock/MetaboDiff")

and once installed loaded by

library("MetaboDiff")

2 Part I: Data processing

2.1 Reading data and annotation

2.1.1 Example data

The example data is derived from a study by Priolo and colleagues in which they used the service of Metabolon® to compare the tissue metabolome of 40 prostate cancers with 16 normal prostate specimens1 Priolo, C., Pyne, S., Rose, J., Regan, E. R., Zadra, G., Photopoulos, C., et al. (2014). AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer. Cancer Research, 74(24), 7198–7204. http://doi.org/10.1158/0008-5472.CAN-14-1490.

# load example data
data = as.matrix(read.csv(system.file("extdata", 
                                      "Priolo_data.csv", 
                                      package = "MetaboDiff")))

2.1.2 Input files

The metabolomic data within MetaboDiff is stored as a MultiAssayExperiment class2 Sig M (2017). MultiAssayExperiment: Software for the integration of multi-omics experiments in Bioconductor. R package version 1.2.1. This framework enables the coordinated representation of multiple experiments on partially overlapping samples with associated metadata and integrated subsetting across experiments. In the context of metabolomic data analysis, multiple assays are needed to store raw data and imputed data which usually contain different number of metabolites due to missing values (see section on data imputation for more details).

The core components of the MultiAssayExperiment class are:

  • ExperimentList - a slot of class ExperimentList containing data for each experimental assay. Within the ExperimentList slot, the metabolomic data are stored as
    SummarizedExperiment objects consisting of:
    • assay - a matrix containing the relative metabolic measurements.
    • rowData - a dataframe containing the metabolite annotation.
  • colData - a slot of class dataframe describing the sample metadata available across all experiments.
  • sampleMap - a slot of class dataframe relating clinical data to experimental assay.
  • metadata - a slot of class list. Within MetaboDiff, this slot contains a list of dataframes summarizing the results from the comparative data analysis.

2.1.2.1 Creation of assay object containing metabolomic data

assay = apply(data[4:nrow(data),8:ncol(data)],2,as.numeric)
colnames(assay) = paste0(rep("sample",ncol(assay)),1:ncol(assay))
rownames(assay) = paste0(rep("met",nrow(assay)),1:nrow(assay))
knitr::kable(assay[1:4,1:5])
sample1 sample2 sample3 sample4 sample5
met1 33964.73 117318.43 118856.90 78670.7 102565.94
met2 18505.56 167585.32 59621.97 66220.4 74892.27
met3 NA 42373.93 27141.21 NA 38390.78
met4 61638.77 74595.78 NA NA NA

2.1.2.2 Creation of colData object containing sample metadata

colData = data.frame(id = colnames(data[,8:ncol(data)]),
                     tumor_normal = as.vector(t(data[1,8:ncol(data)])),
                   row.names=paste0(rep("pat",ncol(assay)),1:ncol(assay)))

To showcase the full functionality of MetaboDiff, we generate a second random group label “random_gender”.

#set seed for reproducibility 
set.seed(1)

colData$random_gender = sample(c("random_male","random_female"),size = nrow(colData),replace = TRUE)
knitr::kable(head(colData))
id tumor_normal random_gender
pat1 cp2 N random_male
pat2 cp7 N random_male
pat3 cp19 N random_female
pat4 cp26 N random_female
pat5 cp29 N random_male
pat6 cp32 N random_female

2.1.2.3 Creation of rowData object containing metabolite annotations and ids

For the rowData object we start off with the annotation already provided by the commercial vendor:

rowData = as.data.frame(data[4:nrow(data),1:7])
colnames(rowData) = data[3,1:7]
colnames(rowData)[7] = "HMDB_ID"
rownames(rowData) = paste0(rep("met",nrow(assay)),1:nrow(assay))
colnames(rowData)
## [1] "BIOCHEMICAL"   "SUPER_PATHWAY" "SUB_PATHWAY"   "METABOLON_ID" 
## [5] "PLATFORM"      "KEGG_ID"       "HMDB_ID"

2.1.2.4 Annotation using Small Molecular Pathway Database (SMPDB)

Alongside the metabolic measurements, Metabolon® provides metabolite annotation including so called super-pathways and sub-pathways. However, theses annotations might very from vendor to vendor. Hence, the MetaboDiff package makes use of the Small Molecular Pathway Database (SMPDB) for metabolite annotation3 Jewison T, Su Y, Disfany FM, et al. SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database Nucleic Acids Res. 2014 Jan;42(Database issue):D478-84.. MetaboDiff supports all common metabolic IDs as input for the annotation (HMDB, KEGG and ChEBI). In the example data, both the KEGG and the HMDB identifier are available. As the databases (HMDB, KEGG or ChEBI) cover unique annotations due to different standards in identifying and reporting metabolites4 Thiele, I., Swainston, N., Fleming, R. M. T., Hoppe, A., Sahoo, S., Aurich, M. K., et al. (2013). A community-driven global reconstruction of human metabolism. Nature Biotechnology, 31(5), 419–425. http://doi.org/10.1038/nbt.2488, the function get_SMPDBanno queries the database using all three ids (if available) and joins all available information. The current SMPDB build used within MetaboDiff is version 2.0 and will be updated as new versions are released. Please refer to the development branch of MetaboDiff to work with the latest SMPDB annotation.

rowData = get_SMPDBanno(rowData,
                        column_kegg_id=6,
                        column_hmdb_id=7,
                        column_chebi_id=NA)

2.1.3 Creation of SummarizedExperiment object and ExperimentList

se = SummarizedExperiment(assays=assay,
                           rowData=rowData)
experiment_list = list(raw=se)
experiment_list
## $raw
## class: SummarizedExperiment 
## dim: 307 86 
## metadata(0):
## assays(1): ''
## rownames(307): met1 met2 ... met306 met307
## rowData names(22): BIOCHEMICAL SUPER_PATHWAY ... SMPDB|InChI
##   SMPDB|InChI.Key
## colnames(86): sample1 sample2 ... sample85 sample86
## colData names(0):

2.1.4 Creation of sampleMap

sampleMap = data.frame(primary=rownames(colData),
                 colname=colnames(se))
sampleMap_list = listToMap(list(raw=sampleMap))

2.1.5 Construction of MultiAssayExperiment

met = MultiAssayExperiment(experiments = experiment_list,
                           colData = colData,
                           sampleMap = sampleMap_list)
met
## A MultiAssayExperiment object of 1 listed
##  experiment with a user-defined name and respective class. 
##  Containing an ExperimentList class object of length 1: 
##  [1] raw: SummarizedExperiment with 307 rows and 86 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert ExperimentList into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

2.2 Imputation of missing values

In contrast to other high-throughput technologies, missing values are common in quantitative metabolomic datasets.

The function na_heatmap visualizes the missing metabolite measurements across the samples. The name of the column in colData for grouping and the label colors for the two groups need to be specified.

na_heatmap(met,
           group_factor="tumor_normal",
           label_colors=c("darkseagreen","dodgerblue"))
Missing metabolic measurements across the example data set. Missing measurements are visualized by a binary heatmap and barplots summarizing the fraction of missing measurement per metabolite and sample, respectively. In addition the group label of interest (tumor (T) vs. normal (N)) is visualized. Metabolite measurements are only imputed if less than 40 percent are missing.

Figure 1: Missing metabolic measurements across the example data set
Missing measurements are visualized by a binary heatmap and barplots summarizing the fraction of missing measurement per metabolite and sample, respectively. In addition the group label of interest (tumor (T) vs. normal (N)) is visualized. Metabolite measurements are only imputed if less than 40 percent are missing.

The example data supports the need for data imputation (figure 1). It could be shown that k-nearest neighbor imputation minimizes the effects on the normality and variance of the data as long as the number of missing data does not exceed 40%5 Armitage, E. G., Godzien, J., Alonso-Herranz, V., L pez-Gonz lvez, N., & Barbas, C. (2015). Missing value imputation strategies for metabolomics data. Electrophoresis, 36(24), 3050–3060. http://doi.org/10.1002/elps.201500352.

The function ‘knn_impute’ adds the slot “impute” to the MultiAssayExperiment object that contains the imputed relative metabolite measurements for all metabolites with raw measurements in more than 60% of cases. We recommend a cutoff of 40% (i.e. 0.4). However the cutoff might be changed according to the discretion of the user.

met = knn_impute(met,cutoff=0.4)
met
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] raw: SummarizedExperiment with 307 rows and 86 columns 
##  [2] imputed: SummarizedExperiment with 238 rows and 86 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert ExperimentList into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

As apparent form the summary description of the object ‘met’ the imputed data matrix contains only 238 of the 307 original metabolites according the cutoff of 40% missing values.

2.3 Removal of outliers

Before we normalize the data, we want to exclude outliers in the study set. To this end, we provide the function outlier_heatmap. The sample annotation shows the number of missing metabolites per sample as the proxy of the impact of imputation on clustering. To showcase outliers, the hierarchical clustering tree is cut in 2 clusters.

outlier_heatmap(met,
                group_factor="tumor_normal",
                label_colors=c("darkseagreen","dodgerblue"))
Hierarchical clustering of metabolite measurements. The tree is cut in 2 clusters. The user has the choice to exclude one cluster which he thinks might represent outliers. To determine the effect of imputation, a barplot displaying the fraction of missing metabolites is shown in the column annotation of the heatmap.

Figure 2: Hierarchical clustering of metabolite measurements
The tree is cut in 2 clusters. The user has the choice to exclude one cluster which he thinks might represent outliers. To determine the effect of imputation, a barplot displaying the fraction of missing metabolites is shown in the column annotation of the heatmap.

The imputed data of the example study set displays a cluster of 5 samples (cluster 1) with an in average lower relative metabolite abundance. Due to the lack of batch information, this cannot be investigted further at this time. To demonstrate, how a cluster can be removed, we apply the function remove_cluster to remove cluster 1:

met = remove_cluster(met,cluster=1)
met
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] raw: SummarizedExperiment with 307 rows and 81 columns 
##  [2] imputed: SummarizedExperiment with 238 rows and 81 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert ExperimentList into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

As displayed in the summary of the met object, the 5 samples of cluster 1 were successfully removed from the slots “raw” and “imputed”.

2.4 Normalization

Variance stabilizing normalization (vsn) is used ensure that the variance remains nearly constant over the measured spectrum (Huber et al., 2002).

met = normalize_met(met)

2.5 Summary of processed object

met
## A MultiAssayExperiment object of 4 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 4: 
##  [1] raw: SummarizedExperiment with 307 rows and 81 columns 
##  [2] imputed: SummarizedExperiment with 238 rows and 81 columns 
##  [3] norm: SummarizedExperiment with 307 rows and 81 columns 
##  [4] norm_imputed: SummarizedExperiment with 238 rows and 81 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert ExperimentList into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices

At this point the data processing is completed. The MultiAssayExperiment object contains now 4 slots:

  • raw - raw relative metabolic measurements as provided by company or core facility
  • imputed - imputed relative metabolic measurements (k-nearest neighbor imputation)
  • norm - normalized relative metabolic measurements (vsn)
  • norm_imputed - normalized imputed relative metabolic measurements (vsn)

2.6 Quality control of normalization

quality_plot(met,
             group_factor="tumor_normal",
             label_colors=c("darkseagreen","dodgerblue"))
Quality control plot. Boxplot displaying the distribution of (A) raw, (B) imputed (C) normalized and (D) imputed and normalized metabolite measurements for all samples of the study set. Boxplot are colored according to the grouping of interest, i.e. tumor vs normal.

Figure 3: Quality control plot
Boxplot displaying the distribution of (A) raw, (B) imputed (C) normalized and (D) imputed and normalized metabolite measurements for all samples of the study set. Boxplot are colored according to the grouping of interest, i.e. tumor vs normal.

3 Part II: Data analysis

Part II of MetaboDiff contains a set of functions for comparative data analysis that are purpose-built for the preprocessed MultiAssayExperiment object. All statistics obtained in this part will be saved in the metadata slot of the object.

3.1 Principal component analysis (PCA)

metadata(met)$res_pca = FactoMineR::PCA(t(assays(met)[["norm_imputed"]]), 
              scale.unit = TRUE,
              graph=FALSE)

Visualize percentage of explained variances for the first ten principal components i.e. dimensions.

factoextra::fviz_screeplot(metadata(met)$res_pca , addlabels=TRUE)
Variance explained by principal components. Barplot for principal components one to ten are displayed.

Figure 4: Variance explained by principal components
Barplot for principal components one to ten are displayed.

In the example data, principal component 1 explains 17.1% and principal component 2 9.4% of variance. The following function creates the PCA plot for the grouping of interest.

factoextra::fviz_pca_ind(metadata(met)$res_pca,
             label="none",
             habillage=colData(met)[["tumor_normal"]],
             palette=c("darkseagreen","dodgerblue"),
             addEllipses = TRUE)
PCA plot. The samples are colored according to the grouping of interest.

Figure 5: PCA plot
The samples are colored according to the grouping of interest.

3.2 Hypothesis testing

Differential analysis for individual metabolites is performed using Student’s T-Test. Correction for multiple testing is performed by independent hypothesis weighting (IHW6 Ignatiadis, N., Klaus, B., Zaugg, J. B., & Huber, W. (2016). Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nature Methods, 13(7), 577–580. http://doi.org/10.1038/nmeth.3885) with variance as a covariate.

Hypothesis testing will be performed for the sample grouping “tumor_normal”, as well as for the randomly generated grouping “random_gender”.

met = diff_test(met,
                group_factors = c("tumor_normal","random_gender"))

The results of the hypothesis testing is saved in the metadata splot of the MultiAssayExperiment object.

str(metadata(met), max.level=2)
## List of 2
##  $ ttest_tumor_normal :'data.frame': 238 obs. of  4 variables:
##   ..$ pval       : num [1:238] 0.0206 0.7808 0.0832 0.0432 0.5859 ...
##   ..$ adj_pval   : num [1:238] 0.18 0.911 0.219 0.158 0.716 ...
##   ..$ fold_change: num [1:238] 0.2872 0.0366 -0.3936 -0.5391 -0.1646 ...
##   ..$ var        : num [1:238] 0.264 0.287 0.872 1.21 1.516 ...
##  $ ttest_random_gender:'data.frame': 238 obs. of  4 variables:
##   ..$ pval       : num [1:238] 0.2318 0.8626 0.4048 0.0121 0.2111 ...
##   ..$ adj_pval   : num [1:238] 0.83 0.959 0.862 0.386 0.83 ...
##   ..$ fold_change: num [1:238] 0.1372 0.0208 -0.1742 -0.607 -0.3438 ...
##   ..$ var        : num [1:238] 0.264 0.287 0.872 1.21 1.516 ...

The columns of the result data frame are the unadjusted p-value (pval), the adjusted p-value by independent hypothesis weighting (adj_pval), the Fold-Change between groups (fold_change) and the variance (var).

The comparative analysis is visualized by means of a volcano plot.

volcano_plot(met, 
             group_factor="tumor_normal",
             label_colors=c("darkseagreen","dodgerblue"))

As a sanity check, we also display the volcano plot for the random grouping “random_gender” for which we would not expect the same number of significant metabolites.

volcano_plot(met, 
             group_factor="random_gender",
             label_colors=c("brown","orange"))

As expected, no metabolite was significantly different between the randomly assigned grouping male vs. female after multiple testing with a cutoff of p_adjust < 0.05.

3.3 Pathway analysis

In this part of the analysis, we will exploit the Small molecular Pathway Database (SMPDB) annotation to explore metabolic pathways and perform an enrichment analysis.

The annotations from the SMPDB has the prefix “SMPDB|”. The remaining annotation was provided by the commercial vendor that generated the example data (i.e. Metabolon®).

colnames(rowData(met[["norm_imputed"]]))
##  [1] "BIOCHEMICAL"           "SUPER_PATHWAY"         "SUB_PATHWAY"          
##  [4] "METABOLON_ID"          "PLATFORM"              "KEGG_ID"              
##  [7] "HMDB_ID"               "SMPDB|SMPDB.ID"        "SMPDB|Pathway.Name"   
## [10] "SMPDB|Pathway.Subject" "SMPDB|Metabolite.ID"   "SMPDB|Metabolite.Name"
## [13] "SMPDB|HMDB.ID"         "SMPDB|KEGG.ID"         "SMPDB|ChEBI.ID"       
## [16] "SMPDB|DrugBank.ID"     "SMPDB|CAS"             "SMPDB|Formula"        
## [19] "SMPDB|IUPAC"           "SMPDB|SMILES"          "SMPDB|InChI"          
## [22] "SMPDB|InChI.Key"

3.3.1 Variance boxplots

For instance, this annotation can be used to explore the variance in pathways across the data set.

variance_boxplot(met,
                 rowAnnotation = "SMPDB|Pathway.Name")

MetaboDiff refrains from ‘classical’ pathway enrichment analyses, as they are limited by the need to define a priori metabolite sets for evaluation and ignore the topology of interactions. Hence, MetaboDiff offers the analysis of metabolic correlation networks.

3.4 Metabolic correlation network analysis

This section describes the generation and analysis of metabolic correlation networks. The workflow was adapted from the weighted gene co-expression analysis (WGCNA) proposed by Langfelder and Horvarth7 Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, 559–559. http://doi.org/10.1186/1471-2105-9-559.

3.4.1 Construction dissiumilarity matrix

The first step in constructing a metabolic correlation network is the creation of a dissimilarity matrix. Biweight midcorrelation was used as a similiarity measure as it is more robust to outliers than the absolute correlation coefficient8 Zheng, C.-H., Yuan, L., Sha, W., & Sun, Z.-L. (2014). Gene differential coexpression analysis based on biweight correlation and maximum clique. BMC Bioinformatics, 15 Suppl 15(Suppl 15), S3. http://doi.org/10.1186/1471-2105-15-S15-S3. This choice is important, as we do not expect metabolites to be correlated in all patients.

The core concept of the so called “weighted” correlation analysis by Langfelder and Horvarth is that instead of defining a “hard” threshold (e.g. an absolute correlation coefficient > 0.8) to decide whether a node as connected to another, the adjacency a is defined by raising the similarity s to a power beta (‘soft’ threshold):

\[\begin{equation} a_{ij} = s_{ij}^\beta \end{equation}\]

Lastly, the dissimilarity measure is defined by

\[\begin{equation} w_{ij} = 1 - a_{ij} \end{equation}\]

For detailed rational of this approach, please see Zhang and Horvath9 Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17. http://doi.org/10.2202/1544-6115.1128. For metabolic networks, we identified that a beta value of 3 was the lowest power for which the scale-free topology of the topology was met.

The function diss_matrix creates the dissimilarity measure for the met objects and saves it in the metadata slot

met = diss_matrix(met)
str(metadata(met), max.level=2)
## List of 3
##  $ ttest_tumor_normal :'data.frame': 238 obs. of  4 variables:
##   ..$ pval       : num [1:238] 0.0206 0.7808 0.0832 0.0432 0.5859 ...
##   ..$ adj_pval   : num [1:238] 0.18 0.911 0.219 0.158 0.716 ...
##   ..$ fold_change: num [1:238] 0.2872 0.0366 -0.3936 -0.5391 -0.1646 ...
##   ..$ var        : num [1:238] 0.264 0.287 0.872 1.21 1.516 ...
##  $ ttest_random_gender:'data.frame': 238 obs. of  4 variables:
##   ..$ pval       : num [1:238] 0.2318 0.8626 0.4048 0.0121 0.2111 ...
##   ..$ adj_pval   : num [1:238] 0.83 0.959 0.862 0.386 0.83 ...
##   ..$ fold_change: num [1:238] 0.1372 0.0208 -0.1742 -0.607 -0.3438 ...
##   ..$ var        : num [1:238] 0.264 0.287 0.872 1.21 1.516 ...
##  $ diss_matrix        : num [1:238, 1:238] 0 0.964 0.999 0.999 0.996 ...
##   ..- attr(*, "dimnames")=List of 2

3.4.2 Identification of metabolic correlation modules

To identify metabolic correlation modules, metabolites are next clustered based on the dissimilarity measure, where branches of the dendrogram correspond to modules. Ultimately, modules are detected by applying a branch cutting method. We employed the dynamic branch cut method developed by Langfelder and colleagues10 Langfelder, P., Zhang, B., & Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24(5), 719–720. http://doi.org/10.1093/bioinformatics/btm563., as constant height cutoffs exhibit suboptimal performance on complicated dendrograms.

met = identify_modules(met, 
                       min_module_size=5)
##  ..cutHeight not given, setting it to 0.991  ===>  99% of the (truncated) height range in dendro.
##  ..done.
WGCNA::plotDendroAndColors(metadata(met)$tree, 
                    metadata(met)$module_color_vector, 
                    'Module colors', 
                    dendroLabels = FALSE, 
                    hang = 0.03,
                    addGuide = TRUE, 
                    guideHang = 0.05, main='')

The relation between the identified co-expression modules can be visualized by a dendrogram of their eigengenes. The module eigengene is defined as the first principal component of its expression matrix. It could be shown that the module= eigengene is highly correlated with the gene that has the highest intramodular connectivity11 Horvath, S., & Dong, J. (2008). Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Computational Biology (PLOSCB) 4(8), 4(8), e1000117–e1000117. http://doi.org/10.1371/journal.pcbi.1000117.

par(mar=c(2,2,2,2))
ape::plot.phylo(ape::as.phylo(metadata(met)$METree),
           type = 'fan',
           show.tip.label = FALSE, 
           main='')
ape::tiplabels(frame = 'circle',
          col='black', 
          text=rep('',length(unique(metadata(met)$modules))), 
          bg = WGCNA::labels2colors(0:21))

par(mar=c(2,2,2,2))
ape::plot.phylo(ape::as.phylo(metadata(met)$METree),
           type = 'fan',
           show.tip.label = TRUE, 
           main='')

# number of metabolites per module
table(metadata(met)$modules)
## 
##  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 
##  2 30 22 20 15 14 14 13 12 11 10  9  9  8  8  8  7  6  5  5  5  5

3.4.3 Name metabolic correlation modules

#the names are saved in this slot
met = name_modules(met,
                   pathway_annotation = "SUB_PATHWAY")

# plot phylogram with names
ape::plot.phylo(ape::as.phylo(metadata(met)$METree))

3.4.4 Relation of metabolic correlation modules to sample traits

An advantage of co-expression network analysis is the possibility to integrate external information. At the lowest hierarchical level, gene significance (GS) measures can be defined as the statistical significance (i.e. p-value, \(p_i\)) between the \(i\)-th node profile (gene) \(x_i\) and the sample trait \(T\)

\[\begin{equation} GS_i = -log~p_i \end{equation}\]

Module significance in turn can be determined as the average absolute gene significance measure. This conceptual framework can be adapted to any research question.

met = calculate_MS(met,
                   group_factors = c("tumor_normal","random_gender"))
str(metadata(met), max.level=1)
## List of 13
##  $ ttest_tumor_normal :'data.frame': 238 obs. of  4 variables:
##  $ ttest_random_gender:'data.frame': 238 obs. of  4 variables:
##  $ diss_matrix        : num [1:238, 1:238] 0 0.964 0.999 0.999 0.996 ...
##   ..- attr(*, "dimnames")=List of 2
##  $ tree               :List of 7
##   ..- attr(*, "class")= chr "hclust"
##  $ modules            : num [1:238] 3 5 13 1 11 1 11 11 13 1 ...
##  $ module_color_vector: chr [1:238] "brown" "green" "salmon" "turquoise" ...
##  $ MEs                :'data.frame': 81 obs. of  22 variables:
##  $ MEDiss             : num [1:22, 1:22] 0 1.038 1.073 1.055 0.973 ...
##   ..- attr(*, "dimnames")=List of 2
##  $ METree             :List of 7
##   ..- attr(*, "class")= chr "hclust"
##  $ module_colors      : chr [1:22] "yellow" "lightgreen" "grey" "greenyellow" ...
##  $ MM                 : num [1:238, 1:22] 0.32992 0.00722 0.09016 0.00808 0.12118 ...
##   ..- attr(*, "dimnames")=List of 2
##  $ MS_tumor_normal    :'data.frame': 22 obs. of  2 variables:
##  $ MS_random_gender   :'data.frame': 22 obs. of  2 variables:

Plot of module significance

MS_plot(met,
        group_factor="tumor_normal")

MS_plot(met,
        group_factor="random_gender")

3.4.5 Explore module of interest

To explore the role of metabolites within a correlation module, Langfelder and Horvath suggest a ‘fuzzy’ measure of module membership defined as

\[\begin{equation} K^q = |cor(x_i,E^q)| \end{equation}\]

where \(x_i\) is the profile of gene \(i\) and \(E^q\) is the eigengene of module \(q\). Based on this definition, \(K\) describes how closely related metabolite \(i\) is to module \(q\). A meaningful visualization is consequently plotting the module membership over the p-value of the respective GS measure. As a third dimension, the dot-size is weighted according to the effect size, i.e. fold-change between the grouping of interest.

MOI_plot(met,
         group_factor="tumor_normal",
         MOI=2)

Session information

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.5
## 
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] MetaboDiff_0.1.0           ggplot2_2.2.1             
##  [3] ComplexHeatmap_1.14.0      SummarizedExperiment_1.6.3
##  [5] DelayedArray_0.2.4         matrixStats_0.52.2        
##  [7] Biobase_2.36.2             GenomicRanges_1.28.2      
##  [9] GenomeInfoDb_1.12.0        IRanges_2.10.2            
## [11] S4Vectors_0.14.3           BiocGenerics_0.22.0       
## [13] BiocStyle_2.4.1            bookdown_0.4              
## [15] MultiAssayExperiment_1.2.1
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.0         circlize_0.4.0          fastmatch_1.1-0        
##   [4] Hmisc_4.0-3             igraph_1.0.1            plyr_1.8.4             
##   [7] lazyeval_0.2.0          shinydashboard_0.6.1    splines_3.4.0          
##  [10] robust_0.4-18           lpsymphony_1.4.1        digest_0.6.12          
##  [13] foreach_1.4.3           BiocInstaller_1.26.0    htmltools_0.3.6        
##  [16] viridis_0.4.0           GO.db_3.4.1             phytools_0.6-00        
##  [19] magrittr_1.5            checkmate_1.8.2         memoise_1.1.0          
##  [22] fit.models_0.5-14       cluster_2.0.6           doParallel_1.0.10      
##  [25] limma_3.32.2            fastcluster_1.1.22      annotate_1.54.0        
##  [28] colorspace_1.3-2        rrcov_1.4-3             blob_1.1.0             
##  [31] ggrepel_0.6.5           dplyr_0.5.0             jsonlite_1.5           
##  [34] RCurl_1.95-4.8          genefilter_1.58.1       impute_1.50.1          
##  [37] phangorn_2.2.0          ape_4.1                 survival_2.41-3        
##  [40] iterators_1.0.8         gtable_0.2.0            zlibbioc_1.22.0        
##  [43] XVector_0.16.0          UpSetR_1.3.3            GetoptLong_0.1.6       
##  [46] kernlab_0.9-25          shape_1.4.2             maps_3.2.0             
##  [49] prabclus_2.2-6          DEoptimR_1.0-8          scales_0.4.1           
##  [52] vsn_3.44.0              mvtnorm_1.0-6           DBI_0.7                
##  [55] IHW_1.4.0               Rcpp_0.12.11            plotrix_3.6-5          
##  [58] viridisLite_0.2.0       xtable_1.8-2            htmlTable_1.9          
##  [61] flashClust_1.01-2       foreign_0.8-68          bit_1.1-12             
##  [64] mclust_5.3              preprocessCore_1.38.1   Formula_1.2-1          
##  [67] animation_2.5           htmlwidgets_0.8         RColorBrewer_1.1-2     
##  [70] fpc_2.1-10              acepack_1.4.1           modeltools_0.2-21      
##  [73] factoextra_1.0.4        pkgconfig_2.0.1         XML_3.98-1.7           
##  [76] flexmix_2.3-14          nnet_7.3-12             dynamicTreeCut_1.63-1  
##  [79] labeling_0.3            rlang_0.1.1             reshape2_1.4.2         
##  [82] AnnotationDbi_1.38.1    munsell_0.4.3           tools_3.4.0            
##  [85] RSQLite_2.0             fdrtool_1.2.15          evaluate_0.10          
##  [88] stringr_1.2.0           yaml_2.1.14             knitr_1.16             
##  [91] bit64_0.9-7             robustbase_0.92-7       purrr_0.2.2.2          
##  [94] dendextend_1.5.2        nlme_3.1-131            whisker_0.3-2          
##  [97] mime_0.5                slam_0.1-40             leaps_3.0              
## [100] compiler_3.4.0          affyio_1.46.0           clusterGeneration_1.3.4
## [103] tibble_1.3.3            pcaPP_1.9-72            stringi_1.1.5          
## [106] highr_0.6               lattice_0.20-35         trimcluster_0.1-2      
## [109] Matrix_1.2-10           googleVis_0.6.2         msm_1.6.4              
## [112] combinat_0.0-8          GlobalOptions_0.0.12    data.table_1.10.4      
## [115] cowplot_0.7.0           bitops_1.0-6            httpuv_1.3.3           
## [118] R6_2.2.1                latticeExtra_0.6-28     affy_1.54.0            
## [121] gridExtra_2.2.1         codetools_0.2-15        MASS_7.3-47            
## [124] assertthat_0.2.0        rprojroot_1.2           rjson_0.2.15           
## [127] mnormt_1.5-5            GenomeInfoDbData_0.99.0 expm_0.999-2           
## [130] diptest_0.75-7          quadprog_1.5-5          rpart_4.1-11           
## [133] coda_0.19-1             tidyr_0.6.3             class_7.3-14           
## [136] rmarkdown_1.6           ggpubr_0.1.4            numDeriv_2016.8-1      
## [139] scatterplot3d_0.3-40    shiny_1.0.3             WGCNA_1.60             
## [142] base64enc_0.1-3         FactoMineR_1.36